How to protect data at an enterprise level
Posted: November 21, 2024
One reason Artificial Intelligence (AI) is on the tip of privacy professionals’ tongues is that AI is thirsty for data, and needs lots of it. The main engine behind AI’s potential benefits is its ability to sift through enormous amounts of data, even dissimilar data, and come back with a proposed answer. That is, a primary raison d’être for AI is making sense of very large data sets, which is an activity that makes a privacy pro’s toes curl.
Also, some of the most useful AI programs need to learn to be useful, and AI learning requires training data. Training data can come from a variety of sources, and in some cases can even be synthetic data (which can itself be created by generative AI programs). Regardless of the source of the training data, AI needs a lot of data to learn and to refine its operations.
This means that companies have an increasing incentive to maintain and use an endless sea of data, including personal data, so that they can leverage AI in an increasingly fast marketplace race to competitive advantage, and train those same AI systems to be as accurate and complete as possible. The push-pull between business pressures and privacy concerns has the potential to increase as well, as privacy risk rises and Big Data becomes even more central to business strategy. However, with some planning and forethought, privacy and business data goals can still run side by side in the same race to success. Here are some thoughts on how to ensure enterprise data privacy and data-hungry business goals can both exist at the same time.
Know the data – and clean it up!
Whether large or small, organizations that know their data well, including the consents and preferences that apply to each data field-use combination for each data subject, will have the most latitude, flexibility, and confidence in their data use practices. Admittedly, this is easier said than done, especially where the organization in question has a large amount of legacy data collected over time through variable practices. However, the exercise of cleaning up the data sets in a way that builds data confidence – and compliance – is time well spent.
The first step to any good data clean-up effort is to build an understanding of 1) who the data is about, 2) where the data came from, and 3) what data use rights apply.
This requires a careful analysis of each record to determine the data subject’s identity, what data collection experience the organization presented, and therefore what data rights follow. Be prepared: older data often has more ambiguous identity and rights, so the organization may need to make the hard decision to downgrade the allowable uses for some sets of data, or even to delete some portion of the database.
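One way to picture the "data field-use combination for each data subject" idea is as a lookup keyed by subject, field, and purpose. The sketch below is only an illustration under assumptions: the `ConsentLedger` class, its method names, and the sample identifiers are all hypothetical and do not describe any particular product. It treats missing or ambiguous rights as "not allowed", which mirrors the advice to downgrade uses when the record is unclear.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentLedger:
    """Hypothetical store of data use rights, keyed by (subject, field, purpose)."""

    _grants: dict = field(default_factory=dict)

    def record(self, subject_id: str, data_field: str, purpose: str, allowed: bool) -> None:
        # Store the right that followed from the collection experience.
        self._grants[(subject_id, data_field, purpose)] = allowed

    def is_allowed(self, subject_id: str, data_field: str, purpose: str) -> bool:
        # Default deny: if no right was ever recorded, the use is not permitted.
        return self._grants.get((subject_id, data_field, purpose), False)


# Illustrative data only.
ledger = ConsentLedger()
ledger.record("subj-001", "email", "marketing", True)
ledger.record("subj-001", "email", "model_training", False)
```

With this shape, a check such as `ledger.is_allowed("subj-001", "email", "marketing")` can sit in front of any downstream use of the field, and an unknown subject or purpose simply falls through to a denial.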
After the organization has some confidence in its data identity, origin, and allowable uses, there are some data hygiene steps to consider:
- Remove duplicates – Duplicates just increase risk without reward and complicate databases unnecessarily.
- Remove unnecessary data – Applying the minimum necessary rule, which is a requirement in many jurisdictions, will also simplify the database. Also, if the organization collected data that it determines it can no longer use due to consent/preference reasons, it can delete these data at the same time. Like spring cleaning, it can be hard to stick to the rigor of getting rid of anything the organization is not using (because bell-bottom pants will come back into style eventually). However, deleting unnecessary data will enhance rather than hinder data-driven efforts.
- Standardize data – Legacy data sets are notorious for applying different formatting and rules. Take the quite simple example of name, which one database can capture as First Name and Last Name and Middle Initial, while another database may capture as First_Last Name, and still another as Suff, LastName, and First_Initial. Translating these variations into a standard set will not only help smooth operations on these data, but also help set more consistent standards for future data collection in the form of a data catalogue or data dictionary.
- Confirm accuracy – There are many reasons why data can be inaccurate. Phone numbers change, people provide inaccurate information to protect their privacy, and people may just fat-finger data they enter. Once an organization dedupes, deletes, and standardizes, a next logical step is to make sure that the remaining data are accurate, complete, and timely.
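The first three hygiene steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the field names, the `KEEP_FIELDS` set, and the sample records are assumptions made for the example, only two of the name layouts mentioned above are handled, and the simple title-casing would mangle some real names. Confirming accuracy usually needs outside verification and is not shown.

```python
# Hypothetical "minimum necessary" field set for this example.
KEEP_FIELDS = ("first_name", "last_name", "email")


def standardize_name(record):
    """Map two legacy name layouts onto first_name / last_name."""
    if "First Name" in record and "Last Name" in record:
        first, last = record["First Name"], record["Last Name"]
    elif "First_Last Name" in record:
        # Split on the first space: "Ada Lovelace" -> ("Ada", "Lovelace")
        first, _, last = record["First_Last Name"].partition(" ")
    else:
        raise ValueError("unrecognized name layout")
    return {"first_name": first.strip().title(), "last_name": last.strip().title()}


def clean(records):
    """Standardize, drop unnecessary fields, and remove duplicates."""
    seen, cleaned = set(), []
    for raw in records:
        row = dict(raw)
        row.update(standardize_name(raw))
        row["email"] = str(raw.get("email", "")).strip().lower()
        # Remove unnecessary data: keep only the minimum necessary fields.
        row = {k: row[k] for k in KEEP_FIELDS}
        # Remove duplicates: skip rows whose standardized values were already seen.
        key = tuple(row[k] for k in KEEP_FIELDS)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Illustrative records only.
records = [
    {"First Name": "ada", "Last Name": "lovelace", "email": "Ada@Example.com ", "shoe_size": 7},
    {"First_Last Name": "Ada Lovelace", "email": "ada@example.com"},
    {"First Name": "Alan", "Last Name": "Turing", "email": "alan@example.com"},
]
cleaned = clean(records)
```

Standardizing before deduplicating matters here: the first two sample records only match once their names and email addresses are in the same format.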
Know the tools and vendors
Tools and vendors that operate on an organization’s data have the potential to be both the largest risk and the largest privacy enabler. For example, it is possible for AI tools to keep and reproduce data across clients, or even, within a single company, to return data available to a user with more expansive privileges to another user without those same privileges.
Also, vendors may build into their contracts the right to use a customer company’s data to build models for other clients. With these risks in mind, it is critical to deeply understand any tools operating on data, including the safeguards in place to prevent access control discrepancies and cross-company data leakage. Even more importantly, a laser focus on vendor contract terms related to privacy, security, and data rights is essential.
There is also a strong positive potential for tools and vendors to enhance privacy in large data sets. For example, technology can help deidentify data to reduce privacy risk during use. There are techniques, such as homomorphic encryption, that allow for operations on data while they are still encrypted. There are privacy preserving machine learning techniques that help maintain confidentiality. Technology can also aid in maintaining strong, granular access controls to data and maintain a robust governance framework to apply rules to data.
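As a small illustration of technology reducing privacy risk during use, the sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256) from Python's standard library. This is pseudonymization, a weaker measure than full deidentification, and it is not homomorphic encryption; it is swapped in here only because it is simple to show. The `pseudonymize` helper, the placeholder key, and the sample record are assumptions for the example, and a real key would need to be stored and managed securely.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of the value as a hex string."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Placeholder only: a real key belongs in a managed secret store.
SECRET_KEY = b"replace-with-a-managed-secret"

# Illustrative record only.
record = {"email": "ada@example.com", "purchase_total": 42.50}

# Replace the identifier, keep the analytic field untouched.
safe_record = {**record, "email": pseudonymize(record["email"], SECRET_KEY)}
```

Because the same input and key always produce the same token, records can still be joined or counted per person without exposing the underlying email address to the people or tools doing the analysis.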
Educate and verify
In the world of machine learning, it can be easy to be caught up in the AI excitement and forget about the most important type of learning – human learning. People are still the critical component in data use and privacy, and human education about data, privacy, and security is still paramount to success in the digital world. Strong education programs across multiple roles in an organization will help reduce human errors, give workers the knowledge they need to ask the right questions, and give people the confidence to act (in the right way).
No program is complete without regular reviews and monitoring, however. Consider establishing verification steps to confirm that vendors, technologies, workers, and data practices are all working according to plan.
Summary
Protecting data at an enterprise level can seem challenging, especially in this world of Big Data that drives increasingly important business decisions. However, a step-by-step approach can help ensure that privacy and business goals march in step in the same direction.
A company that first reviews its data and takes needed data hygiene action will have a clear, clean, and organized data structure to support sound business decisions and privacy compliance. Then, if the company also takes the time to understand the technology and the vendors that provide that technology, it can take advantage of the incredibly useful tools out there to support privacy and business goals simultaneously, while avoiding pitfalls related to unanticipated vendor data uses and even data breaches.
Moreover, when the whole organization has an appropriate, role-specific understanding of privacy and data concepts, expectations, and safeguards, every worker can move forward with confidence and clarity.
Finally, with a ‘trust but verify’ mentality supported by regular reviews and audits of data practices, vendors, and technologies, the organization will have completed that compliance and data effectiveness circle. This will mean that the organization itself has the confidence to use data in interesting and valuable ways.